Plug&Join: An easy-to-use Generic Algorithm for Efficiently Processing Equi and Non-Equi Joins

Authors

  • Jochen Van den Bercken
  • Martin Schneider
  • Bernhard Seeger
Abstract

This paper presents Plug&Join, a new generic algorithm for efficiently processing a broad class of join types in an extensible database system. Plug&Join is designed to support equi joins, temporal joins, spatial joins, subset joins and other types of joins; in contrast to previous algorithms, it can also be easily customized, and it allows efficient processing of new types of joins that might become relevant in the near future. Depending on the join predicate (and the data types of the join relations), Plug&Join is called with a suitable type of index structure as a parameter. Fortunately, custom types of index structures can be implemented easily under frameworks like GiST, which simplifies and extends the applicability of our approach. Plug&Join partitions both join relations recursively until each partition of the inner relation fits in main memory. If an inner partition fits in memory, the algorithm builds a memory-resident index of the desired type on the inner partition and probes all tuples of the corresponding outer partition against the index. Otherwise, a memory-resident index is created by sampling the inner partition. The index is then used as a partitioning function for both partitions. To demonstrate the flexibility of Plug&Join, we show how to implement equi joins, spatial joins and subset joins by using memory-resident B+-trees, R-trees and S-trees, respectively. Moreover, results obtained from different experiments for the spatial join show that Plug&Join is competitive with special-purpose methods like the Partition Based Spatial-Merge Join algorithm (PBSM).
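
The recursion described in the abstract can be sketched in a few lines of Python. This is only a minimal illustration of the equi-join case, not the authors' implementation: `SortedKeyIndex` is a hypothetical stand-in for a memory-resident B+-tree, `capacity` models main memory as a number of tuples, and `fanout`, `probe_all` and the block-wise fallback for partitions that cannot be split (for example, when all keys are equal) are assumptions added so that the sketch terminates. Non-equi joins, where a tuple may have to be routed to several partitions, are not covered.

```python
import bisect
import random
from collections import defaultdict


class SortedKeyIndex:
    """Hypothetical stand-in for a memory-resident B+-tree on the join key."""

    def __init__(self, rows, key):
        self.rows = sorted(rows, key=key)
        self.keys = [key(row) for row in self.rows]

    def probe(self, value):
        """Return every indexed row whose key equals `value`."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rows[lo:hi]

    def separators(self, fanout):
        """Pick up to fanout - 1 distinct split keys from the indexed keys."""
        n = len(self.keys)
        return sorted({self.keys[n * i // fanout] for i in range(1, fanout)})


def probe_all(inner, outer, inner_key, outer_key):
    """Build an index on `inner` and probe every outer row against it."""
    index = SortedKeyIndex(inner, inner_key)
    return [(r, s) for s in outer for r in index.probe(outer_key(s))]


def plug_and_join(inner, outer, inner_key, outer_key, capacity, fanout=4, rng=None):
    """Equi-join two lists of rows; `capacity` is the assumed memory size in rows."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = rng or random.Random(0)
    if not inner or not outer:
        return []
    if len(inner) <= capacity:
        # The inner partition fits in memory: index it and probe the outer side.
        return probe_all(inner, outer, inner_key, outer_key)

    # Otherwise index a sample of the inner partition and use that index
    # as the partitioning function for both relations.
    sample_index = SortedKeyIndex(rng.sample(inner, capacity), inner_key)
    seps = sample_index.separators(fanout)
    inner_parts, outer_parts = defaultdict(list), defaultdict(list)
    for r in inner:
        inner_parts[bisect.bisect_right(seps, inner_key(r))].append(r)
    for s in outer:
        outer_parts[bisect.bisect_right(seps, outer_key(s))].append(s)

    if len(inner_parts) < 2:
        # Assumed fallback (not from the abstract): the sample could not split
        # the inner partition, so join it block by block instead of recursing.
        result = []
        for start in range(0, len(inner), capacity):
            block = inner[start:start + capacity]
            result.extend(probe_all(block, outer, inner_key, outer_key))
        return result

    result = []
    for bucket, part in inner_parts.items():
        result.extend(plug_and_join(part, outer_parts.get(bucket, []),
                                    inner_key, outer_key, capacity, fanout, rng))
    return result
```

Because both relations are routed through the same separators, rows with equal keys always land in the same pair of partitions, which is what makes the recursive step safe for equi joins. Supporting another join type would mean swapping in a different index class with its own probing and routing rules.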

Similar resources

Auto-Join: Joining Tables by Leveraging Transformations

Traditional equi-join relies solely on string equality comparisons to perform joins. However, in scenarios such as ad hoc data analysis in spreadsheets, users increasingly need to join tables whose join-columns are from the same semantic domain but use different textual representations, for which transformations are needed before equi-join can be performed. We developed Auto-Join, a system that ...

Implementation of Parallel Collection Equi-Join Using MPI

One of the collection join types in Object-Oriented Databases (OODB) is the collection equi-join. The main feature of collection joins is that they involve collection types. In this paper we present our experience in implementing collection equi-join algorithms by using the Message Passing Interface (MPI). In particular, it lays out the fundamental techniques that are used in the implementation and that...

Optimisation of Partitioned Temporal Joins

Joins are the most expensive and performance-critical operations in relational database systems. In this thesis, we investigate processing techniques for joins that are based on a temporal intersection condition. Intuitively, such joins are used whenever one wants to match data from two or more relations that is valid at the same time. This work is divided into two parts. First, we analyse tech...

Parallel Collection Equi-Join Algorithms for Object-Oriented Databases

One of the differences between relational and object-oriented databases (OODB) is that attributes in OODB can be of a collection type (e.g. sets, lists, arrays, bags) as well as a simple type (e.g. integer, string). Consequently, explicit join queries in OODB may be based on collection attributes. One form of collection join queries in OODB is “collection-equi join queries”, where the joins are ...

Heuristic joins to integrate structured heterogeneous data

Heterogeneous data sources often exhibit semantic heterogeneity at the data level; that is, the same entity in the world is referred to in different ways both within and across sources. This paper discusses a framework for combining information from such sources, called heuristic join, that is an extension of the familiar equi-join for homogeneous sources. Heuristic join uses heuristic match op...

Publication date: 2000